Multi-Agent Model-Based Reinforcement Learning Experiments in the Pursuit Evasion Game
Abstract
This paper describes multi-agent learning experiments performed on tactical sequences of the pursuit-evasion game on very small grids. It highlights the performance difference between a centralized approach and a distributed approach when using Rmax, a model-based reinforcement learning algorithm. The prey's goal is to leave the grid, and the predators' goal is to kill the prey. The prey may or may not learn. The predators learn in one of two ways: in the centralized approach they are part of a single learning agent, and in the distributed approach each predator is a learning agent in itself. Every agent learns to accomplish its goal by using Rmax. Our results compare the centralized approach with the distributed approach. Future work mainly includes scaling up to larger boards using model-free algorithms and exploring partial observability of agents.
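As a rough illustration of the Rmax idea mentioned in the abstract, the sketch below shows a minimal tabular agent: every state-action pair that has been tried fewer than `m` times is assumed to yield the maximum possible return, which drives the agent to explore it, and the learned model is re-solved by value iteration whenever a pair becomes known. This is not the paper's implementation. The class `RmaxAgent`, the helper `corridor_step`, the three-cell corridor, and all parameter values are hypothetical choices made for illustration only.

```python
from collections import defaultdict


class RmaxAgent:
    """Minimal tabular Rmax sketch (hypothetical, not the paper's code)."""

    def __init__(self, n_states, n_actions, r_max=1.0, m=1, gamma=0.95):
        self.n_states, self.n_actions = n_states, n_actions
        self.m, self.gamma = m, gamma
        # Optimistic value assigned to every not-yet-known (state, action) pair.
        self.v_max = r_max / (1.0 - gamma)
        self.count = defaultdict(int)                            # visits per (s, a)
        self.reward_sum = defaultdict(float)                     # summed reward per (s, a)
        self.next_count = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visits}
        self.q = [[self.v_max] * n_actions for _ in range(n_states)]

    def act(self, s):
        # Greedy with respect to the optimistic Q-values; ties go to the lowest index.
        return max(range(self.n_actions), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s_next, done):
        key = (s, a)
        if self.count[key] >= self.m:
            return  # pair is already known; the model is frozen for it
        self.count[key] += 1
        self.reward_sum[key] += r
        # Terminal transitions are stored under None and contribute no future value.
        self.next_count[key][None if done else s_next] += 1
        if self.count[key] == self.m:
            self.plan()

    def plan(self, sweeps=500):
        # Value iteration on the learned model; unknown pairs keep the value v_max.
        for _ in range(sweeps):
            for s in range(self.n_states):
                for a in range(self.n_actions):
                    n = self.count[(s, a)]
                    if n < self.m:
                        self.q[s][a] = self.v_max
                        continue
                    future = sum(
                        c / n * max(self.q[s2])
                        for s2, c in self.next_count[(s, a)].items()
                        if s2 is not None
                    )
                    self.q[s][a] = self.reward_sum[(s, a)] / n + self.gamma * future


def corridor_step(s, a, n_cells=3):
    """Hypothetical toy task: a prey on a 1-D corridor escapes off the right end.

    Action 0 moves left, action 1 moves right. Returns (next_state, reward, done).
    """
    if a == 1:
        if s == n_cells - 1:
            return s, 1.0, True  # the prey leaves the grid
        return s + 1, 0.0, False
    return max(s - 1, 0), 0.0, False


def train(episodes=5, max_steps=50):
    agent = RmaxAgent(n_states=3, n_actions=2)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = agent.act(s)
            s_next, r, done = corridor_step(s, a)
            agent.update(s, a, r, s_next, done)
            if done:
                break
            s = s_next
    return agent


agent = train()
policy = [agent.act(s) for s in range(3)]
```

Under these assumptions, a centralized setup would correspond to one such agent whose action index encodes the joint move of all predators, while a distributed setup would give each predator its own `RmaxAgent` instance. The toy corridor here only exercises the single-agent case.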
Similar resources
Modeling the Behaviour of Interacting Autonomous Intelligent Agents
Initial research is focused on building and comparing online/off-line navigation behaviours in autonomous situated agents. Three soft controllers will be evaluated using: Fuzzy reactive subsumption architecture. Deliberative evolutionary module using Genetic Algorithms (Collins et al. 1998a). Reinforcement learning using temporal difference (TD) methods (Millan 1996) i.e. Sarsa-learning. Amongs...
Policy Learning in Imperfect-information Infinite Dynamic Games
Dynamic games (DGs) play an important role in distributed decision making and control in complex environments. Finding optimal/approximate solutions for these games in the imperfect-information setting is currently a challenge for mathematicians and computer scientists, especially when state and action spaces are infinite. This paper presents an approach to this problem by using multi-agent rei...
Cooperative Cognitive Agents and Reinforcement Learning in Pursuit Game
This paper illustrates how a self-organizing cognitive architecture, known as TD-FALCON, can learn to function and cooperate in a dynamic environment. TD-FALCON learns the value functions of the state-action space estimated through a temporal difference (TD) method. The learned value functions are then used to determine the optimal actions based on an action selection policy. To tackle a multi-a...
Anytime algorithms for multi-agent visibility-based pursuit-evasion games
We investigate algorithms for playing multi-agent visibility-based pursuit-evasion games. A team of pursuers attempts to maintain visibility contact with an evader who actively avoids tracking. We aim for applicability of the algorithms in real-world scenarios; hence, we impose hard constraints on the run-time of the algorithms and we evaluate them in a simulation model based on a real-world urb...
Memory Based Learning of Pursuit Games
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that memory based learning can be used in conjunction with genetic algorithms to solve a difficult class of delayed reinforcement learning problems that both methods have trouble solving individually. This class, the class of differe...
Publication date: 2008